M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

Author: Bera Aniket
Bhattacharya Uttaran
Chandra Rohan
Manocha Dinesh
Mittal Trisha
Publication date: 22/11/2019

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step that uses Canonical Correlation Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.
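The abstract above mentions a check step based on Canonical Correlation Analysis. As a rough illustration only, and not the authors' implementation, the sketch below computes canonical correlations between two feature matrices using the standard QR-plus-SVD approach and flags a modality as ineffective when its strongest correlation with a reference modality falls below a threshold. The function names, the choice of reference modality, and the threshold value of 0.5 are all assumptions made for this example.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between two feature matrices (rows are samples).

    Assumes both matrices have full column rank after centering.
    """
    x = x - x.mean(axis=0)          # center each feature column
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)         # orthonormal basis for the column space of x
    qy, _ = np.linalg.qr(y)         # orthonormal basis for the column space of y
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)     # guard against tiny numerical overshoot

def is_effective(reference, candidate, threshold=0.5):
    """Hypothetical check: keep a modality only if it correlates with the reference.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return bool(canonical_correlations(reference, candidate)[0] >= threshold)

# Example with synthetic features (hypothetical "face" and "speech" modalities).
rng = np.random.default_rng(0)
face = rng.normal(size=(500, 4))
speech_clean = face @ rng.normal(size=(4, 3)) + 0.01 * rng.normal(size=(500, 3))
speech_noisy = rng.normal(size=(500, 3))    # unrelated to the face features
print(is_effective(face, speech_clean), is_effective(face, speech_noisy))
```

In this toy setup the "noisy" modality is pure noise, so its correlation with the reference is low; real sensor noise would be less clear-cut, and the threshold would need tuning.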

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

Author: Bera Aniket
Bhattacharya Uttaran
Chandra Rohan
Manocha Dinesh
Mittal Trisha
Randhavane Tanmay
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 28/10/2019

We present a novel classifier network called STEP, to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral. We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 2,177 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits classification accuracy of 89% on E-Gait, which is 14-30% more accurate than prior methods.
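To give a sense of what a graph convolution over a skeleton looks like, here is a minimal sketch of a single spatial graph convolution step applied to a sequence of joint features. It uses the common symmetric normalization with self-loops and is a simplification: the actual STEP and ST-GCN networks are more elaborate (for example, they also convolve along the time axis), and the toy three-joint skeleton, function names, and feature sizes below are assumptions for illustration only.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)                      # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0                 # undirected bone between joints
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # degree includes the self-loop
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spatial_graph_conv(x, a_hat, w):
    """Apply one graph convolution to every frame.

    x:     (frames, joints, in_features) joint features over time
    a_hat: (joints, joints) normalized adjacency
    w:     (in_features, out_features) weight matrix (learned in practice)
    """
    return np.einsum("uv,tvc,cd->tud", a_hat, x, w)

# Toy example: a 3-joint chain observed over 5 frames with 3D joint positions.
edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(edges, num_joints=3)
rng = np.random.default_rng(0)
gait = rng.normal(size=(5, 3, 3))               # (frames, joints, xyz)
weights = rng.normal(size=(3, 8))               # random stand-in for learned weights
features = spatial_graph_conv(gait, a_hat, weights)
print(features.shape)
```

Each output joint feature is a weighted mix of that joint and its skeletal neighbors, which is how the graph structure of the body enters the model.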

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications